NoSta-D: A Corpus of German Non-Standard Varieties
Abstract
Until recently, most research in computational linguistics has been done on newspaper texts. Nowadays, the focus has been extended to other types of language data. This means that many linguistic descriptions and automatic tools need to be adapted or extended to non-newspaper language. The non-standard varieties corpus of German (NoSta-D) will provide a first gold standard for evaluation and training data of dependency analysis, named entity recognition and coreference resolution for out-of-domain text types.
Related resources
The Pronouncing Dictionary of Austrian German and the other Major Varieties of German - A Phonetic Resources Database on the Pronunciation of German
The paper gives a comprehensive overview on the project “Varieties of Austrian German Standard pronunciation and varieties of standard pronunciation” whose primary goal is the creation of a pronouncing dictionary of Austrian German and the creation of a large data base of audio samples for research on spoken language and different forms of pronunciation in Austria. The contents of the dictionar...
Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking Styles
This work investigates acoustic and perceptual differences in four language varieties by using a corpus of professional and non-professional speaking styles. The professional stimuli are composed of excerpts of broadcast news and political discourses from six subjects in each case. The non-professional stimuli are made up of recordings of 10 subjects who read a long story and narrated it subseq...
A Comparative Study of Intonation in Three Standard Varieties of German
This paper presents a comparative analysis of declarative intonation produced by standard speakers of German from Austria, Germany and Switzerland. The analysis was based on a directly comparable corpus of speech data. A perception test with phoneticians from the three countries suggested (1) that speakers from the three varieties produce different tunes on accented syllables, and (2) that ther...
On the Realization of Schwa in Two Varieties of Standard German
The present study investigates the realization of the central vowel in Northern Standard German and Austrian Standard German. An acoustic formant analysis of professional and non-professional speakers shows different realizations of this vowel in both varieties (more central vs. more fronted). In contrast to earlier reports a subgroup of nonprofessional Austrian speakers tends to realize a schw...
Tagging Historical Corpora - the problem of spelling variation
Spelling issues tend to create relatively minor (though still complex) problems for corpus linguistics, information retrieval and natural language processing tasks that use ‘standard’ or modern varieties of English. For example, in corpus annotation, we have to decide how to deal with tokenisation issues such as whether (i) periods represent sentence boundaries or acronyms and (ii) apostrophes ...
Publication date: 2013